Self-similarity in complete genomes

Authors

  • Ta-Yuan Chen
  • Li-Ching Hsieh
Abstract

Recently it was reported that, in terms of the global features of the frequency distributions of short words, whole genomes are equivalent to random sequences of a much shorter length which, for a given word length, is genome-independent, or universal. For two-letter words the universal equivalent random-sequence length was found to be about 300 bases. Here we show that, as a rule, whole genomes are highly self-similar in exhibiting this universal property. For two-letter words, with few exceptions, any segment longer than ∼1000 bases from any genome possesses the same universal property. Taken together, the universality and self-similarity suggest that genomes are close to being self-organized critical systems. We show that a model in which genomes grow by maximally stochastic segmental duplication generates sequences that share the universal and self-similar properties of genomes.
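The two ingredients named in the abstract, frequency distributions of short words and growth by segmental duplication, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names `word_counts`, `relative_spread` and `grow_by_duplication`, the use of standard deviation over mean as a summary of the distribution, and the `max_seg` cap on the copied segment length. The abstract does not specify the authors' actual statistic or growth rule.

```python
import random
from collections import Counter
from itertools import product
from statistics import mean, pstdev


def word_counts(seq, k=2, alphabet="ACGT"):
    """Count overlapping k-letter words; words that never occur get a count of 0.

    Words containing letters outside `alphabet` (e.g. 'N') are ignored.
    """
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    words = ("".join(w) for w in product(alphabet, repeat=k))
    return {w: counts.get(w, 0) for w in words}


def relative_spread(seq, k=2):
    """Illustrative summary statistic (an assumption, not the paper's measure):
    population standard deviation of the word counts divided by their mean."""
    values = list(word_counts(seq, k).values())
    avg = mean(values)
    if avg == 0:
        raise ValueError("sequence contains no countable words of length k")
    return pstdev(values) / avg


def grow_by_duplication(seed, target_len, max_seg=50, rng=None):
    """Toy growth model (hypothetical parameters): repeatedly copy a randomly
    chosen segment and insert the copy at a random position."""
    rng = rng or random.Random()
    seq = seed
    while len(seq) < target_len:
        seg_len = rng.randint(1, min(max_seg, len(seq)))
        start = rng.randrange(len(seq) - seg_len + 1)
        insert_at = rng.randrange(len(seq) + 1)
        seq = seq[:insert_at] + seq[start:start + seg_len] + seq[insert_at:]
    return seq[:target_len]
```

One way to use the sketch is to compute `relative_spread` on segments of a grown sequence and on a uniformly random sequence of the same length, then compare how the two values change with segment length.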


Similar resources

Dimensions of fractals related to languages defined by tagged strings in complete genomes

A representation of frequency of strings of length K in complete genomes of many organisms in a square has led to seemingly self-similar patterns when K increases. These patterns are caused by under-represented strings with a certain “tag”-string and they define some fractals in the K → ∞ limit. The Box and Hausdorff dimensions of the limit set are discussed. Although the method proposed by Mau...


Similarity Analysis of DNA Sequences based on the LZ Complexity

Motivation. Almost all methods for similarity analysis and phylogenetic inference are usually based on the multiple alignment of sequences or the invariants of sequences. But the former is not useful to all types of data, e.g. the whole genome comparisons, while the latter is accompanied by the complex calculation. The motivation of this paper is to introduce a new approach for similarity analy...


Uncertainty Modeling of a Group Tourism Recommendation System Based on Pearson Similarity Criteria, Bayesian Network and Self-Organizing Map Clustering Algorithm

Group tourism is one of the most important tasks in tourist recommender systems. These systems, despite the potential contradictions among the group's tastes, seek to provide joint suggestions to all members of the group, and propose recommendations that would allow the satisfaction of a group of users rather than individual user satisfaction. Another issue that has received less attention i...


Data set of phylogenetic analysis inferred based on the complete genomes of the family Nodaviridae

In this article, nine complete genomes of viruses from the genera Alphanodavirus and Betanodavirus (Family Nodaviridae) were comparatively analyzed and the data of their evolutionary origins and relatedness are reported. The nucleotide sequence alignment of the complete genomes from all species and their deduced evolutionary relationships are presented. High sequence similarity within the genus ...


Acquired Antimicrobial Resistance Genes of Escherichia coli Obtained from Nigeria: In silico Genome Analysis

Background: Antimicrobial resistance is a global problem with enormous public health and economic impact. This study was carried out to get an overview of acquired antimicrobial resistance gene sequences in the genomes of Escherichia coli isolated from different food sources and the environment in Nigeria. Methods: To determine the acquired antimicrobial-resistant genes prevalence, genome asse...




Publication date: 2004